{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Symmetrize\n",
    "\n",
    "In this notebook, we will use the _symmetrize_ function to create bi-directional edges in an undirected graph\n",
    "\n",
    "Notebook Credits\n",
    "* Original Authors: Bradley Rees and James Wyles\n",
    "* Created:   08/13/2019\n",
    "* Updated:   03/02/2020\n",
    "\n",
    "RAPIDS Versions: 0.13    \n",
    "\n",
    "Test Hardware\n",
    "\n",
    "* GV100 32G, CUDA 10.2\n",
    "\n",
    "\n",
    "## Introduction\n",
    "In many cases, an Undirected graph is saved as a single edge between vertex pairs.  That saves a lot of space in the data file.  However, in order to process that data in cuGraph, there needs to be an edge in each direction for undirected.  Converting from a single edge to two edges, one in each direction, is called symmetrization.  \n",
    "\n",
    "To symmerize an edge list (COO data) use:<br>\n",
    "\n",
    "**cugraph.symmetrize(source, destination, value)**\n",
    "* __source__: cudf.Series\n",
    "* __destination__: cudf.Series\n",
    "* __value__: cudf.Series\n",
    "\n",
    "\n",
    "Returns:\n",
    "* __triplet__: three variables are returned:\n",
    "    * __source__: cudf.Series\n",
    "    * __destination__: cudf.Series\n",
    "    * __value__: cudf.Series\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Test Data\n",
    "We will be using an undirected unsymmetrized version of the Zachary Karate club dataset.  The result of symmetrization shopuld be a dataset equal to to the version used in the PageRank notebook.\n",
    "\n",
    "*W. W. Zachary, An information flow model for conflict and fission in small groups, Journal of\n",
    "Anthropological Research 33, 452-473 (1977).*\n",
    "\n",
    "\n",
    "![Karate Club](../img/zachary_black_lines.png)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import needed libraries\n",
    "import cugraph\n",
    "import cudf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read the unsymmetrized data  \n",
    "unsym_data ='../data/karate_undirected.csv'\n",
    "gdf = cudf.read_csv(unsym_data, names=[\"src\", \"dst\"], delimiter='\\t', dtype=[\"int32\", \"int32\"] )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# load the full symmetrized dataset for comparison\n",
    "datafile='../data/karate-data.csv'\n",
    "test_gdf = cudf.read_csv(datafile, names=[\"src\", \"dst\"], delimiter='\\t', dtype=[\"int32\", \"int32\"] )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Unsymmetrized Graph\n",
      "\tNumber of Edges: 78\n",
      "Baseline Graph\n",
      "\tNumber of Edges: 156\n"
     ]
    }
   ],
   "source": [
    "print(\"Unsymmetrized Graph\")\n",
    "print(\"\\tNumber of Edges: \" + str(len(gdf)))\n",
    "print(\"Baseline Graph\")\n",
    "print(\"\\tNumber of Edges: \" + str(len(test_gdf)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "_Since the unsymmetrized graph only has one edge between vertices, that underlying code treats that as a directed graph_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "G = cugraph.Graph()\n",
    "G.from_cudf_edgelist(gdf, source='src', destination='dst')\n",
    "gdf_page = cugraph.pagerank(G)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>vertex</th>\n",
       "      <th>pagerank</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>34</td>\n",
       "      <td>0.100917</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    vertex  pagerank\n",
       "33      34  0.100917"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# best PR score is\n",
    "m = gdf_page['pagerank'].max()\n",
    "df = gdf_page.query('pagerank == @m')\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Now Symmetrize the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "sdf = cugraph.symmetrize_df(gdf, 'src', 'dst')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Unsymmetrized Graph\n",
      "\tNumber of Edges: 78\n",
      "Symmetrized Graph\n",
      "\tNumber of Edges: 156\n",
      "Baseline Graph\n",
      "\tNumber of Edges: 156\n"
     ]
    }
   ],
   "source": [
    "print(\"Unsymmetrized Graph\")\n",
    "print(\"\\tNumber of Edges: \" + str(len(gdf)))\n",
    "print(\"Symmetrized Graph\")\n",
    "print(\"\\tNumber of Edges: \" + str(len(sdf)))\n",
    "print(\"Baseline Graph\")\n",
    "print(\"\\tNumber of Edges: \" + str(len(test_gdf)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "Copyright (c) 2019, NVIDIA CORPORATION.\n",
    "\n",
    "Licensed under the Apache License, Version 2.0 (the \"License\");  you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0\n",
    "\n",
    "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.\n",
    "___"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "cugraph_dev",
   "language": "python",
   "name": "cugraph_dev"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
